Overview

Dataset statistics

Number of variables: 14
Number of observations: 284104
Missing cells: 44652
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.3 MiB
Average record size in memory: 112.0 B
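
The missing-cell percentage follows from the counts above. A minimal sketch of the arithmetic, assuming (as the alerts suggest) that all missing cells sit in the single column GEMIDDELDE_VERKOOPPRIJS; the variable names are illustrative only:

```python
# Figures copied from the dataset statistics in this report.
n_variables = 14
n_observations = 284104
missing_cells = 44652

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of all cells in the table that are missing.
missing_share = missing_cells / total_cells

# Share of missing values within the one affected column.
column_missing_share = missing_cells / n_observations

print(f"{missing_share:.1%}")         # 1.1%
print(f"{column_missing_share:.1%}")  # 15.7%
```

The same count therefore appears as 1.1% at table level and 15.7% at column level.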

Variable types

Categorical: 4
DateTime: 1
Numeric: 9

Alerts

VERSIE has constant value "1.0" Constant
DATUM_BESTAND has constant value "2021-10-18" Constant
PEILDATUM has constant value "2021-10-01" Constant
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1770 distinct values High cardinality
BEHANDELEND_SPECIALISME_CD is highly correlated with AANTAL_PAT_PER_SPC High correlation
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD High correlation
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD High correlation
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG High correlation
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG High correlation
AANTAL_PAT_PER_SPC is highly correlated with BEHANDELEND_SPECIALISME_CD and 1 other field High correlation
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC High correlation
AANTAL_PAT_PER_SPC is highly correlated with JAAR and 1 other field High correlation
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC High correlation
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with JAAR and 1 other field High correlation
VERSIE is highly correlated with DATUM_BESTAND and 1 other field High correlation
DATUM_BESTAND is highly correlated with VERSIE and 1 other field High correlation
PEILDATUM is highly correlated with VERSIE and 1 other field High correlation
JAAR is highly correlated with AANTAL_PAT_PER_SPC and 1 other field High correlation
GEMIDDELDE_VERKOOPPRIJS has 44652 (15.7%) missing values Missing
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 21.38670207) Skewed
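
The skewness alert reports γ1, the standardised third moment. A minimal sketch of the plain moment-based definition follows; the function name and sample data are illustrative, and the report may use a bias-adjusted estimator, so its figures need not match this formula exactly on small samples.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]          # no tail on either side
long_tail = [1, 1, 1, 2, 2, 3, 500]  # one very large value, like a count column

print(skewness(symmetric))  # zero for symmetric data
print(skewness(long_tail))  # clearly positive for a long right tail
```

Count-like columns with many small values and a few very large ones, such as AANTAL_SUBTRAJECT_PER_ZPD, produce large positive values of this statistic.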

Reproduction

Analysis started: 2021-11-03 21:49:06.848770
Analysis finished: 2021-11-03 21:49:31.660005
Duration: 24.81 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

VERSIE
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value  Count
1.0    284104

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 852312
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count   Frequency (%)
1.0    284104  100.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
1.0    284104  100.0%

Most occurring characters

Value  Count   Frequency (%)
1      284104  33.3%
.      284104  33.3%
0      284104  33.3%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     568208  66.7%
Other Punctuation  284104  33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      284104  50.0%
0      284104  50.0%
Other Punctuation
Value  Count   Frequency (%)
.      284104  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  852312  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      284104  33.3%
.      284104  33.3%
0      284104  33.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  852312  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      284104  33.3%
.      284104  33.3%
0      284104  33.3%

DATUM_BESTAND
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value       Count
2021-10-18  284104

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2841040
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-10-18
2nd row: 2021-10-18
3rd row: 2021-10-18
4th row: 2021-10-18
5th row: 2021-10-18

Common Values

Value       Count   Frequency (%)
2021-10-18  284104  100.0%

Length

Histogram of lengths of the category

Pie chart

Value       Count   Frequency (%)
2021-10-18  284104  100.0%

Most occurring characters

Value  Count   Frequency (%)
1      852312  30.0%
2      568208  20.0%
0      568208  20.0%
-      568208  20.0%
8      284104  10.0%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    2272832  80.0%
Dash Punctuation  568208   20.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      852312  37.5%
2      568208  25.0%
0      568208  25.0%
8      284104  12.5%
Dash Punctuation
Value  Count   Frequency (%)
-      568208  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  2841040  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      852312  30.0%
2      568208  20.0%
0      568208  20.0%
-      568208  20.0%
8      284104  10.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2841040  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      852312  30.0%
2      568208  20.0%
0      568208  20.0%
-      568208  20.0%
8      284104  10.0%

PEILDATUM
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value       Count
2021-10-01  284104

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2841040
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-10-01
2nd row: 2021-10-01
3rd row: 2021-10-01
4th row: 2021-10-01
5th row: 2021-10-01

Common Values

Value       Count   Frequency (%)
2021-10-01  284104  100.0%

Length

Histogram of lengths of the category

Pie chart

Value       Count   Frequency (%)
2021-10-01  284104  100.0%

Most occurring characters

Value  Count   Frequency (%)
0      852312  30.0%
1      852312  30.0%
2      568208  20.0%
-      568208  20.0%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    2272832  80.0%
Dash Punctuation  568208   20.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      852312  37.5%
1      852312  37.5%
2      568208  25.0%
Dash Punctuation
Value  Count   Frequency (%)
-      568208  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  2841040  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      852312  30.0%
1      852312  30.0%
2      568208  20.0%
-      568208  20.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2841040  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      852312  30.0%
1      852312  30.0%
2      568208  20.0%
-      568208  20.0%
JAAR
Date

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2021-01-01 00:00:00
Histogram with fixed size bins (bins=10)

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 424.1597936
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 301
5-th percentile: 302
Q1: 305
median: 313
Q3: 322
95-th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 931.4989193
Coefficient of variation (CV): 2.196103764
Kurtosis: 69.51143197
Mean: 424.1597936
Median Absolute Deviation (MAD): 8
Skewness: 8.449858075
Sum: 120505494
Variance: 867690.2367
Monotonicity: Not monotonic
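
Several of the statistics above are derived from others in the same block. As a hedged consistency check, the sketch below recomputes them from the BEHANDELEND_SPECIALISME_CD figures using the standard definitions; the variable names are illustrative.

```python
# Values copied from the BEHANDELEND_SPECIALISME_CD statistics above.
minimum, maximum = 301, 8418
q1, q3 = 305, 322
mean = 424.1597936
std = 931.4989193

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr, round(cv, 6), round(variance, 2))
```

The large gap between the range (8117) and the IQR (17) reflects a handful of codes far above the bulk of the values, which also explains the high skewness and kurtosis.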
Histogram with fixed size bins (bins=27)

Common values

Value              Count  Frequency (%)
305                40296  14.2%
313                36877  13.0%
303                32738  11.5%
330                22673  8.0%
316                19328  6.8%
308                14804  5.2%
306                11837  4.2%
324                11824  4.2%
301                11518  4.1%
304                9303   3.3%
Other values (17)  72906  25.7%

Minimum 10 values

Value  Count  Frequency (%)
301    11518  4.1%
302    6212   2.2%
303    32738  11.5%
304    9303   3.3%
305    40296  14.2%
306    11837  4.2%
307    4935   1.7%
308    14804  5.2%
310    3176   1.1%
313    36877  13.0%

Maximum 10 values

Value  Count  Frequency (%)
8418   3798   1.3%
1900   186    0.1%
390    759    0.3%
389    3054   1.1%
362    4034   1.4%
361    2018   0.7%
335    2917   1.0%
330    22673  8.0%
329    751    0.3%
328    6054   2.1%

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct: 1770
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value                Count
101                  1201
402                  1172
403                  1141
301                  1133
203                  1073
Other values (1765)  278384

Length

Max length: 4
Median length: 3
Mean length: 3.349329823
Min length: 2

Characters and Unicode

Total characters: 951558
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 404
2nd row: 112
3rd row: 707
4th row: 111
5th row: 110

Common Values

Value                Count   Frequency (%)
101                  1201    0.4%
402                  1172    0.4%
403                  1141    0.4%
301                  1133    0.4%
203                  1073    0.4%
201                  1069    0.4%
401                  953     0.3%
404                  951     0.3%
802                  929     0.3%
409                  926     0.3%
Other values (1760)  273556  96.3%

Length

Histogram of lengths of the category

Most occurring characters

Value              Count   Frequency (%)
1                  182186  19.1%
0                  173922  18.3%
2                  126036  13.2%
3                  103337  10.9%
5                  73119   7.7%
9                  68861   7.2%
4                  67806   7.1%
7                  56054   5.9%
6                  49706   5.2%
8                  40880   4.3%
Other values (15)  9651    1.0%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    941907  99.0%
Uppercase Letter  9651    1.0%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
G                 1807   18.7%
M                 1602   16.6%
B                 1158   12.0%
E                 827    8.6%
Z                 777    8.1%
D                 656    6.8%
A                 631    6.5%
F                 611    6.3%
C                 320    3.3%
K                 310    3.2%
Other values (5)  952    9.9%
Decimal Number
Value  Count   Frequency (%)
1      182186  19.3%
0      173922  18.5%
2      126036  13.4%
3      103337  11.0%
5      73119   7.8%
9      68861   7.3%
4      67806   7.2%
7      56054   6.0%
6      49706   5.3%
8      40880   4.3%

Most occurring scripts

Value   Count   Frequency (%)
Common  941907  99.0%
Latin   9651    1.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
G                 1807   18.7%
M                 1602   16.6%
B                 1158   12.0%
E                 827    8.6%
Z                 777    8.1%
D                 656    6.8%
A                 631    6.5%
F                 611    6.3%
C                 320    3.3%
K                 310    3.2%
Other values (5)  952    9.9%
Common
Value  Count   Frequency (%)
1      182186  19.3%
0      173922  18.5%
2      126036  13.4%
3      103337  11.0%
5      73119   7.8%
9      68861   7.3%
4      67806   7.2%
7      56054   6.0%
6      49706   5.3%
8      40880   4.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  951558  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
1                  182186  19.1%
0                  173922  18.3%
2                  126036  13.2%
3                  103337  10.9%
5                  73119   7.7%
9                  68861   7.2%
4                  67806   7.1%
7                  56054   5.9%
6                  49706   5.2%
8                  40880   4.3%
Other values (15)  9651    1.0%

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct: 5937
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 439888418.7
Minimum: 10501002
Maximum: 998418081
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 10501002
5-th percentile: 28999037
Q1: 99799028
median: 149599021.5
Q3: 990004004
95-th percentile: 990516014
Maximum: 998418081
Range: 987917079
Interquartile range (IQR): 890204976

Descriptive statistics

Standard deviation: 428883557.2
Coefficient of variation (CV): 0.974982607
Kurtosis: -1.733266876
Mean: 439888418.7
Median Absolute Deviation (MAD): 119600015.5
Skewness: 0.4718533774
Sum: 1.249740593 × 10^14
Variance: 1.839411056 × 10^17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
990004009            2102    0.7%
990004007            2060    0.7%
990003004            2000    0.7%
990004006            1659    0.6%
990356076            1499    0.5%
990356073            1372    0.5%
990003007            1300    0.5%
131999228            1271    0.4%
131999164            1253    0.4%
199299013            1196    0.4%
Other values (5927)  268392  94.5%

Minimum 10 values

Value     Count  Frequency (%)
10501002  7      < 0.1%
10501003  10     < 0.1%
10501004  10     < 0.1%
10501005  10     < 0.1%
10501007  3      < 0.1%
10501008  10     < 0.1%
10501010  10     < 0.1%
10501011  3      < 0.1%
11101002  9      < 0.1%
11101003  10     < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
998418081  136    < 0.1%
998418080  122    < 0.1%
998418079  35     < 0.1%
998418077  7      < 0.1%
998418076  7      < 0.1%
998418075  6      < 0.1%
998418074  188    0.1%
998418073  187    0.1%
998418072  7      < 0.1%
998418071  7      < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9379
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 505.127133
Minimum: 1
Maximum: 164407
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 13
Q3: 101
95-th percentile: 1703
Maximum: 164407
Range: 164406
Interquartile range (IQR): 98

Descriptive statistics

Standard deviation: 3144.476564
Coefficient of variation (CV): 6.2251191
Kurtosis: 406.3582677
Mean: 505.127133
Median Absolute Deviation (MAD): 12
Skewness: 16.75970234
Sum: 143508639
Variance: 9887732.859
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
1                    47157   16.6%
2                    23123   8.1%
3                    15083   5.3%
4                    11141   3.9%
5                    8630    3.0%
6                    7271    2.6%
7                    6054    2.1%
8                    5113    1.8%
9                    4707    1.7%
10                   4165    1.5%
Other values (9369)  151660  53.4%

Minimum 10 values

Value  Count  Frequency (%)
1      47157  16.6%
2      23123  8.1%
3      15083  5.3%
4      11141  3.9%
5      8630   3.0%
6      7271   2.6%
7      6054   2.1%
8      5113   1.8%
9      4707   1.7%
10     4165   1.5%

Maximum 10 values

Value   Count  Frequency (%)
164407  1      < 0.1%
155871  1      < 0.1%
154272  1      < 0.1%
147961  1      < 0.1%
144726  1      < 0.1%
117383  1      < 0.1%
115606  1      < 0.1%
110208  1      < 0.1%
109677  1      < 0.1%
108959  1      < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 10080
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 593.6964386
Minimum: 1
Maximum: 239907
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 14
Q3: 110
95-th percentile: 1932.85
Maximum: 239907
Range: 239906
Interquartile range (IQR): 107

Descriptive statistics

Standard deviation: 4014.367424
Coefficient of variation (CV): 6.761649831
Kurtosis: 729.4669142
Mean: 593.6964386
Median Absolute Deviation (MAD): 13
Skewness: 21.38670207
Sum: 168671533
Variance: 16115145.81
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
1                     45470   16.0%
2                     22731   8.0%
3                     14947   5.3%
4                     10939   3.9%
5                     8566    3.0%
6                     7242    2.5%
7                     6020    2.1%
8                     5048    1.8%
9                     4688    1.7%
10                    4131    1.5%
Other values (10070)  154322  54.3%

Minimum 10 values

Value  Count  Frequency (%)
1      45470  16.0%
2      22731  8.0%
3      14947  5.3%
4      10939  3.9%
5      8566   3.0%
6      7242   2.5%
7      6020   2.1%
8      5048   1.8%
9      4688   1.7%
10     4131   1.5%

Maximum 10 values

Value   Count  Frequency (%)
239907  1      < 0.1%
232484  1      < 0.1%
232177  1      < 0.1%
228146  1      < 0.1%
227658  1      < 0.1%
223836  1      < 0.1%
221165  1      < 0.1%
218623  1      < 0.1%
213790  1      < 0.1%
204749  1      < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8318
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7590.617373
Minimum: 1
Maximum: 227300
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 40
Q1: 390
median: 1676
Q3: 6219
95-th percentile: 36289
Maximum: 227300
Range: 227299
Interquartile range (IQR): 5829

Descriptive statistics

Standard deviation: 17745.88311
Coefficient of variation (CV): 2.337870852
Kurtosis: 34.1989451
Mean: 7590.617373
Median Absolute Deviation (MAD): 1529
Skewness: 5.085166272
Sum: 2156524758
Variance: 314916367.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
21                   450     0.2%
9                    437     0.2%
8                    425     0.1%
19                   420     0.1%
25                   412     0.1%
37                   410     0.1%
28                   407     0.1%
12                   400     0.1%
14                   400     0.1%
6                    396     0.1%
Other values (8308)  279947  98.5%

Minimum 10 values

Value  Count  Frequency (%)
1      340    0.1%
2      357    0.1%
3      356    0.1%
4      383    0.1%
5      360    0.1%
6      396    0.1%
7      352    0.1%
8      425    0.1%
9      437    0.2%
10     329    0.1%

Maximum 10 values

Value   Count  Frequency (%)
227300  23     < 0.1%
213510  25     < 0.1%
212640  17     < 0.1%
210805  17     < 0.1%
210440  19     < 0.1%
209671  24     < 0.1%
204672  17     < 0.1%
200178  16     < 0.1%
198533  20     < 0.1%
189111  19     < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9177
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10828.85059
Minimum: 1
Maximum: 367763
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 50
Q1: 510
median: 2301
Q3: 8851
95-th percentile: 51242
Maximum: 367763
Range: 367762
Interquartile range (IQR): 8341

Descriptive statistics

Standard deviation: 26196.719
Coefficient of variation (CV): 2.419159706
Kurtosis: 38.24225139
Mean: 10828.85059
Median Absolute Deviation (MAD): 2114
Skewness: 5.352422043
Sum: 3076519767
Variance: 686268086.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
25                   350     0.1%
17                   349     0.1%
34                   345     0.1%
31                   336     0.1%
38                   334     0.1%
6                    333     0.1%
10                   333     0.1%
13                   333     0.1%
4                    331     0.1%
46                   328     0.1%
Other values (9167)  280732  98.8%

Minimum 10 values

Value  Count  Frequency (%)
1      279    0.1%
2      293    0.1%
3      302    0.1%
4      331    0.1%
5      303    0.1%
6      333    0.1%
7      317    0.1%
8      283    0.1%
9      253    0.1%
10     333    0.1%

Maximum 10 values

Value   Count  Frequency (%)
367763  23     < 0.1%
348460  25     < 0.1%
341708  19     < 0.1%
327682  24     < 0.1%
323799  20     < 0.1%
312879  17     < 0.1%
309714  17     < 0.1%
297716  17     < 0.1%
288415  16     < 0.1%
267042  19     < 0.1%

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 269
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 663146.4615
Minimum: 458
Maximum: 1489502
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 458
5-th percentile: 43282
Q1: 254267
median: 733987
Q3: 1005769
95-th percentile: 1333993
Maximum: 1489502
Range: 1489044
Interquartile range (IQR): 751502

Descriptive statistics

Standard deviation: 423139.1409
Coefficient of variation (CV): 0.6380779593
Kurtosis: -1.180388968
Mean: 663146.4615
Median Absolute Deviation (MAD): 340129
Skewness: 0.04390350245
Sum: 1.884025623 × 10^11
Variance: 1.790467326 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
880969              5102    1.8%
874284              4354    1.5%
843990              4348    1.5%
894410              4333    1.5%
880569              4273    1.5%
893071              4210    1.5%
733987              4048    1.4%
1084173             3890    1.4%
1099752             3862    1.4%
1063681             3851    1.4%
Other values (259)  241833  85.1%

Minimum 10 values

Value  Count  Frequency (%)
458    60     < 0.1%
1562   125    < 0.1%
1610   130    < 0.1%
1923   131    < 0.1%
2497   173    0.1%
3187   239    0.1%
3675   67     < 0.1%
5006   81     < 0.1%
6811   380    0.1%
7049   331    0.1%

Maximum 10 values

Value    Count  Frequency (%)
1489502  2976   1.0%
1450621  3054   1.1%
1421847  3564   1.3%
1345229  3543   1.2%
1333993  3436   1.2%
1332881  3546   1.2%
1317377  3463   1.2%
1296722  1181   0.4%
1283082  3577   1.3%
1262591  1201   0.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 269
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1061410.007
Minimum: 480
Maximum: 2660129
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 480
5-th percentile: 47968
Q1: 364428
median: 1041762
Q3: 1729108
95-th percentile: 2488652
Maximum: 2660129
Range: 2659649
Interquartile range (IQR): 1364680

Descriptive statistics

Standard deviation: 744078.5251
Coefficient of variation (CV): 0.7010283682
Kurtosis: -0.8847560124
Mean: 1061410.007
Median Absolute Deviation (MAD): 687346
Skewness: 0.3513993003
Sum: 3.015508286 × 10^11
Variance: 5.536528515 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
1211813             5102    1.8%
1281747             4354    1.5%
1216294             4348    1.5%
1315720             4333    1.5%
1300621             4273    1.5%
1332547             4210    1.5%
1098722             4048    1.4%
2558008             3890    1.4%
2660129             3862    1.4%
2488652             3851    1.4%
Other values (259)  241833  85.1%

Minimum 10 values

Value  Count  Frequency (%)
480    60     < 0.1%
1773   125    < 0.1%
1863   130    < 0.1%
2200   131    < 0.1%
2819   173    0.1%
3366   239    0.1%
3761   67     < 0.1%
5037   81     < 0.1%
7180   331    0.1%
7390   380    0.1%

Maximum 10 values

Value    Count  Frequency (%)
2660129  3862   1.4%
2603692  3845   1.4%
2558008  3890   1.4%
2488652  3851   1.4%
2481643  3726   1.3%
2184417  3757   1.3%
2066342  3810   1.3%
2035919  1169   0.4%
1985490  1167   0.4%
1978552  3691   1.3%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct: 3295
Distinct (%): 1.4%
Missing: 44652
Missing (%): 15.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 3498.83887
Minimum: 0
Maximum: 287220
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 140
Q1: 460
median: 1215
Q3: 4015
95-th percentile: 13275
Maximum: 287220
Range: 287220
Interquartile range (IQR): 3555

Descriptive statistics

Standard deviation: 6532.291406
Coefficient of variation (CV): 1.86698835
Kurtosis: 162.8873368
Mean: 3498.83887
Median Absolute Deviation (MAD): 990
Skewness: 7.658529223
Sum: 837803965
Variance: 42670831.02
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
160                  1875    0.7%
105                  1859    0.7%
110                  1595    0.6%
180                  1360    0.5%
145                  1359    0.5%
300                  1304    0.5%
190                  1256    0.4%
165                  1223    0.4%
140                  1200    0.4%
185                  1189    0.4%
Other values (3285)  225232  79.3%
(Missing)            44652   15.7%

Minimum 10 values

Value  Count  Frequency (%)
0      2      < 0.1%
70     226    0.1%
75     76     < 0.1%
80     362    0.1%
85     920    0.3%
90     602    0.2%
95     659    0.2%
100    930    0.3%
105    1859   0.7%
110    1595   0.6%

Maximum 10 values

Value   Count  Frequency (%)
287220  8      < 0.1%
148910  3      < 0.1%
142835  4      < 0.1%
122155  4      < 0.1%
116765  3      < 0.1%
109725  7      < 0.1%
108570  7      < 0.1%
107655  4      < 0.1%
101270  8      < 0.1%
95465   7      < 0.1%

Interactions

[Figure: interaction plots for pairs of numeric variables (81 panels); images not reproduced]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
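
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the report generator's own code; the helper names `ranks`, `pearson` and `spearman` are assumptions, and ties receive average ranks.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic but strongly nonlinear (cubes)
print(spearman(x, y), pearson(x, y))
```

On this example ρ is 1 up to rounding while Pearson's r is lower, because the relationship is monotonic but not linear.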

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
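
The calculation above can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample values are hypothetical; the second call shifts and rescales both variables with positive factors to illustrate the invariance property.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors in the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

Both calls return the same value up to rounding, which is what invariance under location and scale changes means in practice.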

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
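
The pair-counting definition above can be sketched as follows. This follows the simple formula in the text (often called tau-a), where tied pairs count as neither concordant nor discordant; library implementations may apply a tie correction, so results can differ on data with ties. The function name `kendall_tau` and the data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical data: only the pair (2, 3) vs (3, 2) is out of order.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.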

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
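
For illustration, here is a minimal plain-Python sketch of a bias-corrected Cramér's V, using the correction as it is commonly stated for Bergsma's estimator: subtract (k-1)(r-1)/(n-1) from φ² and shrink the row and column counts accordingly. Whether this report's generator uses exactly this form is an assumption, and the function name `cramers_v_corrected` and the sample data are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_label, row_n in row_totals.items():
        for col_label, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction (assumed form): shrink phi^2 and the table dimensions.
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corrected - 1, k_corrected - 1)
    if denominator <= 0:
        return 0.0  # a constant variable carries no association
    return sqrt(phi2_corrected / denominator)


# Hypothetical data: b is a relabelling of a (perfect association) ...
a = ["x"] * 10 + ["y"] * 10
v_perfect = cramers_v_corrected(a, ["p"] * 10 + ["q"] * 10)
# ... versus labels that are exactly independent of each other.
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 5, ["p", "q", "p", "q"] * 5)
print(v_perfect, v_independent)
```

The two calls land at the ends of the 0 to 1 range described above: the relabelled variable gives a value of 1 up to rounding, and the independent labels give 0.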

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The Phik project provides extensive documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
0 1.0 2021-10-18 2021-10-01 2015-01-01 3244041319992081051105316421850248748378596430.0
1 1.0 2021-10-18 2021-10-01 2015-01-01 3241121319991186615922798248748378596975.0
2 1.0 2021-10-18 2021-10-01 2015-01-01 32470713199920780801045811442248748378596575.0
3 1.0 2021-10-18 2021-10-01 2015-01-01 32411113199911733122163248748378596835.0
4 1.0 2021-10-18 2021-10-01 2015-01-01 3241101319990221125292487483785966700.0
5 1.0 2021-10-18 2021-10-01 2015-01-01 32439913199915616318021183198248748378596680.0
6 1.0 2021-10-18 2021-10-01 2015-01-01 3248031319990221919258528822487483785966700.0
7 1.0 2021-10-18 2021-10-01 2015-01-01 324107131999207444710081534248748378596575.0
8 1.0 2021-10-18 2021-10-01 2015-01-01 32430413199920732357451215248748378596575.0
9 1.0 2021-10-18 2021-10-01 2015-01-01 32420213199911911119818112487483785961460.0

Last rows

VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
284094 1.0 2021-10-18 2021-10-01 2018-01-01 3270413990027168130016298996150992000533713293165.0
284095 1.0 2021-10-18 2021-10-01 2018-01-01 32705129900272091122924421200053371329NaN
284096 1.0 2021-10-18 2021-10-01 2018-01-01 3270415990027131494924474284200053371329165.0
284097 1.0 2021-10-18 2021-10-01 2018-01-01 32701189900271986258879301533200053371329220.0
284098 1.0 2021-10-18 2021-10-01 2018-01-01 3270212990027198132196170325200053371329220.0
284099 1.0 2021-10-18 2021-10-01 2018-01-01 327071699002719952857023783807200053371329845.0
284100 1.0 2021-10-18 2021-10-01 2018-01-01 3270614990027180336811067200053371329NaN
284101 1.0 2021-10-18 2021-10-01 2018-01-01 3270117990027135775781981920005337132940545.0
284102 1.0 2021-10-18 2021-10-01 2018-01-01 32706159900271851719996154720005337132914135.0
284103 1.0 2021-10-18 2021-10-01 2018-01-01 3270315990027152881075206220005337132974370.0